A Language-Neutral Sparse-Data Algorithm for Extracting Translation Patterns
Abstract
In this paper, we present an algorithm for the automatic extraction of translation patterns between two (Indo-)European languages. A translation pattern consists of possibly discontiguous text fragments, with the bilingual relationship between the fragments and the discontinuities between them made explicit. The patterns are extracted from a bilingual parallel corpus aligned at the sentence level, without the need for linguistic analysis, and are used to build a translation memory (TM) database intended for use in a machine-aided human translation (MAHT) setting, such as a translator's workbench (TWB). The extracted patterns could also form the basis for example-based machine translation (EBMT) without the need for complex linguistic or statistical processing. Given a TM database made up of such translation patterns and a source-language (SL) input string, relevant patterns combine to form target-language (TL) translations, which are offered to the translator as suggestions. We evaluate the accuracy of the extracted translation patterns along with the quality of the translations produced.
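The abstract does not specify how a pattern is stored, so the following is only a minimal sketch of one possible reading: literal fragments are strings, discontinuities are numbered gaps, and a shared gap number on both sides stands in for the explicit bilingual link. The class `TranslationPattern`, the function `suggest`, and the toy English-French patterns are all hypothetical and are not taken from the paper. The sketch covers recombination only, not the extraction algorithm, and it assumes every pattern contains at least one literal fragment so that the recursion terminates.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationPattern:
    # Hypothetical representation: str items are literal text fragments,
    # int items are numbered gaps. The same gap number on both sides
    # links a source-side gap to its target-side position.
    source: tuple
    target: tuple

    def match(self, sentence):
        """Return {gap: text} if the source side covers the whole sentence, else None."""
        parts = []
        for item in self.source:
            if isinstance(item, int):
                parts.append(f"(?P<g{item}>.+?)")   # a gap binds any non-empty text
            else:
                parts.append(re.escape(item))       # a fragment must match literally
        m = re.fullmatch(r"\s+".join(parts), sentence)
        if m is None:
            return None
        return {int(name[1:]): text for name, text in m.groupdict().items()}

    def render(self, bindings):
        """Fill the target side with already-translated gap contents."""
        return " ".join(bindings[i] if isinstance(i, int) else i for i in self.target)


def suggest(sentence, patterns):
    """Combine matching patterns into target-language suggestions."""
    suggestions = []
    for pattern in patterns:
        bindings = pattern.match(sentence)
        if bindings is None:
            continue
        translated = {}
        for gap, text in bindings.items():
            sub = suggest(text, patterns)   # translate each gap filler recursively
            if not sub:
                break                       # a gap could not be translated
            translated[gap] = sub[0]
        else:
            suggestions.append(pattern.render(translated))
    return suggestions


# Toy patterns, invented for illustration only.
PATTERNS = [
    TranslationPattern(("the cat",), ("le chat",)),
    TranslationPattern(("sleep",), ("dort",)),
    # English "does not" maps to the discontiguous French "ne ... pas".
    TranslationPattern((1, "does not", 2), (1, "ne", 2, "pas")),
]

print(suggest("the cat does not sleep", PATTERNS))  # ['le chat ne dort pas']
```

The third toy pattern shows why the gaps have to be explicit: the contiguous source fragment "does not" corresponds to two target fragments that surround the translation of gap 2.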
Similar resources
Linguistic Knowledge and Complexity in an EBMT System Based on Translation Patterns
An approach to Example-Based Machine Translation is presented which operates by extracting translation patterns from a bilingual corpus aligned at the level of the sentence. This is carried out using a language-neutral recursive machine-learning algorithm based on the principle of similar distributions of strings. The translation patterns extracted represent generalisations of sentences that ar...
Probabilistic Inference for Machine Translation
We advance the state-of-the-art for discriminatively trained machine translation systems by presenting novel probabilistic inference and search methods for synchronous grammars. By approximating the intractable space of all candidate translations produced by intersecting an n-gram language model with a synchronous grammar, we are able to train and decode models incorporating millions of sparse, ...
Translation Pattern Extraction and Recombination for Example-Based Machine Translation
An approach to Example-Based Machine Translation is presented which operates by extracting and recombining translation patterns from a bilingual corpus aligned at the level of the sentence. The translation patterns are extracted using a recursive machine-learning algorithm based on the principle of similar distributions of strings: source and target language lexical items that co-occur in the sa...
Improving the Performance of a Sparse Representation-Based Classifier for Brain Signal Classification
In this paper, the problem of classifying motor imagery EEG signals using a sparse representation-based classifier is considered. Designing a powerful dictionary matrix, i.e. extracting proper features, is an important issue in such a classifier. Due to its high performance, the Common Spatial Patterns (CSP) algorithm is widely used for this purpose in BCI systems. The main disadvanta...
Learning redundant dictionaries with translation invariance property: the MoTIF algorithm
Sparse approximation using redundant dictionaries is an efficient tool for many applications in the field of signal processing. Performance largely depends on how well the dictionary is adapted to the signal being decomposed. As the statistical dependencies are most of the time not obvious in natural high-dimensional data, learning fundamental patterns is an alternative to analytical design of bas...
Publication date: 1999